OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Robust Keyword Retrieval Method for OCRed Text

Internal identifier: 000547 (Main/Exploration); previous: 000546; next: 000548

Authors: Yusaku Fujii [Japan]; Hiroaki Takebe [Japan]; Hiroshi Tanaka [Japan]; Yoshinobu Hotta [Japan]

Source: Proceedings of SPIE, the International Society for Optical Engineering [0277-786X]; 2011

RBID: Pascal:11-0279163

Abstract

Document management systems have become important because of the growing popularity of electronic document filing and of scanning books, magazines, manuals, etc., with a scanner or a digital camera, for storage or for reading on a PC or an electronic book. Text acquired by optical character recognition (OCR) is usually added to the electronic documents to support document retrieval. Since OCR-generated text generally includes character recognition errors, robust retrieval methods have been introduced to overcome this problem. In this paper, we propose a retrieval method that is robust against both character segmentation errors and character recognition errors. In the proposed method, inserting noise characters into the keyword and dropping characters from it during retrieval provides robustness against character segmentation errors, while substituting keyword characters with the OCR recognition candidates for each character, or with any other character, provides robustness against character recognition errors. The recall rate of the proposed method was 15% higher than that of the conventional method. However, the precision rate was 64% lower.
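
The abstract does not specify the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It swaps in a standard weighted edit-distance search (approximate substring matching) in which an extra text character plays the role of an inserted noise character, a missing keyword character plays the role of a dropped character, and a substitution is cheaper when the keyword character is among the OCR recognition candidates for that text position. The function name `fuzzy_find`, the `candidates` mapping, and all cost values are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple


def fuzzy_find(
    keyword: str,
    text: str,
    max_cost: float,
    candidates: Optional[Dict[int, Set[str]]] = None,
    noise_cost: float = 1.0,
    drop_cost: float = 1.0,
    sub_cost: float = 1.0,
    candidate_cost: float = 0.25,
) -> List[Tuple[int, float]]:
    """Return (end_index, cost) for each text position where the keyword
    matches approximately with a total cost of at most max_cost.

    candidates maps a text index to the alternative characters the OCR
    engine considered there (hypothetical input format).
    """
    candidates = candidates or {}
    # prev[i]: cheapest cost of aligning keyword[:i] with a text substring
    # that ends just before the current text position.
    prev = [i * drop_cost for i in range(len(keyword) + 1)]
    hits: List[Tuple[int, float]] = []
    for j, ch in enumerate(text):
        cur = [0.0]  # a match may start at any text position for free
        for i, kc in enumerate(keyword, start=1):
            if kc == ch:
                step = 0.0                # exact character match
            elif kc in candidates.get(j, ()):
                step = candidate_cost     # keyword char is an OCR candidate
            else:
                step = sub_cost           # substitution by any other character
            cur.append(min(
                prev[i - 1] + step,       # match or substitution
                prev[i] + noise_cost,     # text contains an extra noise character
                cur[i - 1] + drop_cost,   # keyword character missing from the text
            ))
        if cur[-1] <= max_cost:
            hits.append((j + 1, cur[-1]))
        prev = cur
    return hits


if __name__ == "__main__":
    ocr_text = "keyword retrleval method"   # 'i' misread as 'l'
    print(fuzzy_find("retrieval", ocr_text, max_cost=1.0))
    print(fuzzy_find("retrieval", ocr_text, max_cost=0.5, candidates={12: {"i"}}))
    print(fuzzy_find("keyword", "a key word b", max_cost=1.0))  # spurious space
```

Raising `max_cost` admits more error-tolerant matches, which tends to raise recall at the expense of precision, the same kind of trade-off the abstract reports.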



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Robust Keyword Retrieval Method for OCRed Text</title>
<author>
<name sortKey="Fujii, Yusaku" sort="Fujii, Yusaku" uniqKey="Fujii Y" first="Yusaku" last="Fujii">Yusaku Fujii</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>FUJITSU LABORATORIES LTD., 1-1 Kamikodanaka 4-chome</s1>
<s2>Nakahara-ku, Kawasaki</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>Nakahara-ku, Kawasaki</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Takebe, Hiroaki" sort="Takebe, Hiroaki" uniqKey="Takebe H" first="Hiroaki" last="Takebe">Hiroaki Takebe</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>FUJITSU LABORATORIES LTD., 1-1 Kamikodanaka 4-chome</s1>
<s2>Nakahara-ku, Kawasaki</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>Nakahara-ku, Kawasaki</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Tanaka, Hiroshi" sort="Tanaka, Hiroshi" uniqKey="Tanaka H" first="Hiroshi" last="Tanaka">Hiroshi Tanaka</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>FUJITSU LABORATORIES LTD., 1-1 Kamikodanaka 4-chome</s1>
<s2>Nakahara-ku, Kawasaki</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>Nakahara-ku, Kawasaki</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Hotta, Yoshinobu" sort="Hotta, Yoshinobu" uniqKey="Hotta Y" first="Yoshinobu" last="Hotta">Yoshinobu Hotta</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>FUJITSU LABORATORIES LTD., 1-1 Kamikodanaka 4-chome</s1>
<s2>Nakahara-ku, Kawasaki</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>Nakahara-ku, Kawasaki</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">11-0279163</idno>
<date when="2011">2011</date>
<idno type="stanalyst">PASCAL 11-0279163 INIST</idno>
<idno type="RBID">Pascal:11-0279163</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000132</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000641</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000101</idno>
<idno type="wicri:doubleKey">0277-786X:2011:Fujii Y:robust:keyword:retrieval</idno>
<idno type="wicri:Area/Main/Merge">000553</idno>
<idno type="wicri:Area/Main/Curation">000547</idno>
<idno type="wicri:Area/Main/Exploration">000547</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Robust Keyword Retrieval Method for OCRed Text</title>
<author>
<name sortKey="Fujii, Yusaku" sort="Fujii, Yusaku" uniqKey="Fujii Y" first="Yusaku" last="Fujii">Yusaku Fujii</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>FUJITSU LABORATORIES LTD., 1-1 Kamikodanaka 4-chome</s1>
<s2>Nakahara-ku, Kawasaki</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>Nakahara-ku, Kawasaki</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Takebe, Hiroaki" sort="Takebe, Hiroaki" uniqKey="Takebe H" first="Hiroaki" last="Takebe">Hiroaki Takebe</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>FUJITSU LABORATORIES LTD., 1-1 Kamikodanaka 4-chome</s1>
<s2>Nakahara-ku, Kawasaki</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>Nakahara-ku, Kawasaki</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Tanaka, Hiroshi" sort="Tanaka, Hiroshi" uniqKey="Tanaka H" first="Hiroshi" last="Tanaka">Hiroshi Tanaka</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>FUJITSU LABORATORIES LTD., 1-1 Kamikodanaka 4-chome</s1>
<s2>Nakahara-ku, Kawasaki</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>Nakahara-ku, Kawasaki</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Hotta, Yoshinobu" sort="Hotta, Yoshinobu" uniqKey="Hotta Y" first="Yoshinobu" last="Hotta">Yoshinobu Hotta</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>FUJITSU LABORATORIES LTD., 1-1 Kamikodanaka 4-chome</s1>
<s2>Nakahara-ku, Kawasaki</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>Nakahara-ku, Kawasaki</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2011">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Document management</term>
<term>Document retrieval</term>
<term>Electronic document</term>
<term>Imagery</term>
<term>Information retrieval</term>
<term>Keyword</term>
<term>Optical character recognition</term>
<term>Robustness</term>
<term>Segmentation</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Imagerie</term>
<term>Mot clé</term>
<term>Reconnaissance optique caractère</term>
<term>Gestion document</term>
<term>Document électronique</term>
<term>Recherche documentaire</term>
<term>Recherche information</term>
<term>Reconnaissance caractère</term>
<term>Segmentation</term>
<term>Robustesse</term>
<term>0130C</term>
<term>4230</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Document électronique</term>
<term>Recherche documentaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Document management systems have become important because of the growing popularity of electronic filing of documents and scanning of books, magazines, manuals, etc., through a scanner or a digital camera, for storage or reading on a PC or an electronic book. Text information acquired by optical character recognition (OCR) is usually added to the electronic documents for document retrieval. Since texts generated by OCR generally include character recognition errors, robust retrieval methods have been introduced to overcome this problem. In this paper, we propose a retrieval method that is robust against both character segmentation and recognition errors. In the proposed method, the insertion of noise characters and dropping of characters in the keyword retrieval enables robustness against character segmentation errors, and character substitution in the keyword of the recognition candidate for each character in OCR or any other character enables robustness against character recognition errors. The recall rate of the proposed method was 15% higher than that of the conventional method. However, the precision rate was 64% lower.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Japon</li>
</country>
</list>
<tree>
<country name="Japon">
<noRegion>
<name sortKey="Fujii, Yusaku" sort="Fujii, Yusaku" uniqKey="Fujii Y" first="Yusaku" last="Fujii">Yusaku Fujii</name>
</noRegion>
<name sortKey="Hotta, Yoshinobu" sort="Hotta, Yoshinobu" uniqKey="Hotta Y" first="Yoshinobu" last="Hotta">Yoshinobu Hotta</name>
<name sortKey="Takebe, Hiroaki" sort="Takebe, Hiroaki" uniqKey="Takebe H" first="Hiroaki" last="Takebe">Hiroaki Takebe</name>
<name sortKey="Tanaka, Hiroshi" sort="Tanaka, Hiroshi" uniqKey="Tanaka H" first="Hiroshi" last="Tanaka">Hiroshi Tanaka</name>
</country>
</tree>
</affiliations>
</record>
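
As a complement to the Dilib commands, a record like the one above can also be read with a general-purpose XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the record. One caveat: the record uses the `wicri:` and `inist:` prefixes without declaring them, so a strict parser rejects it as is. The sketch therefore binds them to placeholder URIs (`urn:example:wicri` and `urn:example:inist`), which are assumptions and not the real namespace identifiers. The helper name `parse_record` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace bindings (assumption): the record does not declare
# the wicri: and inist: prefixes, so we add declarations before parsing.
NS_DECL = 'xmlns:wicri="urn:example:wicri" xmlns:inist="urn:example:inist"'

# Shortened copy of the record, for illustration only.
SAMPLE = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Robust Keyword Retrieval Method for OCRed Text</title>
<author>
<name sortKey="Fujii, Yusaku" first="Yusaku" last="Fujii">Yusaku Fujii</name>
<affiliation wicri:level="1">
<country>Japon</country>
</affiliation>
</author>
</titleStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Document retrieval</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(xml_text: str) -> dict:
    """Extract the title, author names, and English keywords from a record."""
    # Bind the undeclared prefixes on the root element, then parse.
    patched = xml_text.replace("<record>", f"<record {NS_DECL}>", 1)
    root = ET.fromstring(patched)
    title_stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        "keywords_en": [
            term.text
            for term in root.findall(".//keywords[@scheme='KwdEn']/term")
        ],
    }


if __name__ == "__main__":
    print(parse_record(SAMPLE))
```

The same function should work on the full record text, since it only depends on the `<record>` root and on the element paths shown above.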

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000547 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000547 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:11-0279163
   |texte=   Robust Keyword Retrieval Method for OCRed Text
}}


This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024